{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#Python文件读写\n",
    "\n",
    "##1、打开文件\n",
    "`f=open(\"c:\\\\pythoncoding\\\\test.txt\",'r')`\n",
    "\n",
    "如果文件不存在，open()函数就会抛出一个IOError的错误，并且给出错误码和详细的信息告诉你文件不存在：\n",
    "如果文件打开成功，接下来，调用read()方法可以一次读取文件的全部内容，Python把内容读到内存，用一个str对象表示：\n",
    "`f.read()`\n",
    "最后一步是调用close()方法关闭文件。文件使用完毕后必须关闭，因为文件对象会占用操作系统的资源，并且操作系统同一时间能打开的文件数量也是有限的：\n",
    "\n",
    "由于文件读写可能产生IOError，一旦出错，就不会调用f.close()，为了保证是否出错都能正确的关闭文件，可以使用try……finally来实现：\n",
    "```\n",
    "try:\n",
    "    f = open('/path/to/file', 'r')\n",
    "    print f.read()\n",
    "finally:\n",
    "    if f:\n",
    "        f.close()\n",
    "```\n",
    "引入with语句来自动调取close（）\n",
    "```\n",
    "with open('/path/to/file', 'r') as f:\n",
    "    print f.read()\n",
    "\n",
    "```\n",
    "此时，不用调用close（）。\n",
    "\n",
    "调用read()会一次性读取文件的全部内容，如果文件有10G，内存就爆了，所以，要保险起见，可以反复调用read(size)方法，每次最多读取size个字节的内容。\n",
    "\n",
    "另外，调用readline()可以每次读取一行内容，调用readlines()一次读取所有内容并按行返回list。因此，要根据需要决定怎么调用。\n",
    "\n",
    "要读取二进制文件，比如图片、视频等等，用'rb'模式打开文件即可：\n",
    "```\n",
    "f = open('/Users/michael/test.jpg', 'rb')\n",
    "f.read()\n",
    "'\\xff\\xd8\\xff\\xe1\\x00\\x18Exif\\x00\\x00...' # 十六进制表示的字节\n",
    "```\n",
    "**注意：非arcii码的文件读取，都需要读取成二进制文件，然后在转码**\n",
    "```\n",
    "f = open('/Users/michael/gbk.txt', 'rb')\n",
    "u = f.read().decode('gbk')\n",
    ">>> u\n",
    "u'\\u6d4b\\u8bd5'\n",
    ">>> print u\n",
    "测试\n",
    "```\n",
    "使用codecs模块自动完成读取的文件的解码\n",
    "```\n",
    "import codecs\n",
    "with codecs.open('/Users/michael/gbk.txt', 'r', 'gbk') as f:\n",
    "    f.read() # u'\\u6d4b\\u8bd5'\n",
    "```\n",
    "##2、写文件\n",
    "写文件和读文件是一样的，唯一区别是调用open()函数时，传入标识符'w'或者'wb'表示写文本文件或写二进制文件：\n",
    "```\n",
    ">>> f = open('/Users/michael/test.txt', 'w')\n",
    ">>> f.write('Hello, world!')\n",
    ">>> f.close()\n",
    "```\n",
    "**你可以反复调用write()来写入文件，但是务必要调用f.close()来关闭文件。当我们写文件时，操作系统往往不会立刻把数据写入磁盘，而是放到内存缓存起来，空闲的时候再慢慢写入。只有调用close()方法时，操作系统才保证把没有写入的数据全部写入磁盘。忘记调用close()的后果是数据可能只写了一部分到磁盘，剩下的丢失了。所以，还是用with语句来得保险：**\n",
    "```\n",
    "with open('/Users/michael/test.txt', 'w') as f:\n",
    "    f.write('Hello, world!')\n",
    "```\n",
    "**写入特定编码，原理与读取文件一样，使用codecs模块\n",
    "\n",
    "##3、创建文件\n",
    "```\n",
    "f=file('c:\\\\pythoncoding\\\\data\\\\myfile.txt','w') \n",
    "```\n",
    "注：'w'说明是写，即创建一个新的，如果该盘下有相同命名的文件夹，则被覆盖，  'wb'二进制的文件写\n",
    "      'r'说明是读，即读取文件  ‘r+'可以读写，'rb'二进制读\n",
    "      'a'说明是追加，即在文件尾追加\n",
    "```\n",
    "f.write(\"hello world\")\n",
    "f.close()\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "f=file('c:\\\\pythoncoding\\\\data\\\\myfile.txt','w')\n",
    "for i in range(10):\n",
    "    f.write('the loop %s' % i)  #注意file中的写入必须是字符串类型，其他的都不行，需要写入其他类型参见pickle Jason\n",
    "f.close()\n",
    "print f.closed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "read，readlines，readline\n",
    "read：一次性读取文件内容到一个字符串，在文件超过内存时，不适合使用\n",
    "readlines：同read，但自动将文件内容分析成一个行的列表，该列表可以由 Python 的 for ... in ... 结构进行处理\n",
    "readline：每次只读取一行，通常比 .readlines() 慢得多。仅当没有足够内存可以一次读取整个文件时，才应该使用 .readline()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['1 5 2 5 6\\n', 'ads dwre wwew 3rd\\n', '\\xd6\\xd0\\xb9\\xfa \\xcd\\xbc\\xc6\\xac \\xbc\\xc6\\xcb\\xe3\\xbb\\xfa\\n', '12 345  224  21323  4533\\n', 'wwe weew dd\\n']\n"
     ]
    }
   ],
   "source": [
    "f=open('c:\\\\pythoncoding\\\\data\\\\test file.txt','r')\n",
    "alist=f.readlines()\n",
    "print alist\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "f = open('c:\\\\pythoncoding\\\\data\\\\test file.txt', 'w')\n",
    "# Writing data to file\n",
    "newdata=['1 5 2 5 6\\n','ads dwre wwew 3rd\\n']\n",
    "f.writelines(newdata)\n",
    "# Closing file\n",
    "f.close()\n",
    "\n",
    "#！！！！！！！！！！！！！！！！！！！\n",
    "#上面这种读方法，在文件特别大时，处理的效率会比较慢"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "arr=np.loadtxt('c:\\\\pythoncoding\\\\data\\\\np-loadtxt.txt')\n",
    "#print arr\n",
    "#保存至 np.savetxt('somenewfile.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "numpy.loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)\n",
    "Parameters:\n",
    "fname : file or str\n",
    "File, filename, or generator to read. If the filename extension is .gz or .bz2, the file is first decompressed. Note that generators should return byte strings for Python 3k.\n",
    "dtype : data-type, optional\n",
    "Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.\n",
    "comments : str or sequence, optional\n",
    "The characters or list of characters used to indicate the start of a comment; default: ‘#’.\n",
    "delimiter : str, optional\n",
    "The string used to separate values. By default, this is any whitespace.\n",
    "converters : dict, optional\n",
    "A dictionary mapping column number to a function that will convert that column to a float. E.g., if column 0 is a date string: converters = {0: datestr2num}. Converters can also be used to provide a default value for missing data (but see also genfromtxt): converters = {3: lambda s: float(s.strip() or 0)}. Default: None.\n",
    "skiprows : int, optional\n",
    "Skip the first skiprows lines; default: 0.\n",
    "usecols : sequence, optional\n",
    "Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.\n",
    "unpack : bool, optional\n",
    "If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...). When used with a structured data-type, arrays are returned for each field. Default is False.\n",
    "ndmin : int, optional\n",
    "The returned array will have at least ndmin dimensions. Otherwise mono-dimensional axes will be squeezed. Legal values: 0 (default), 1 or 2.\n",
    "New in version 1.6.0.\n",
    "Returns:\n",
    "out : ndarray\n",
    "Data read from the text file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('f', 2, 3.0) ('m', 5, 6.0)]\n"
     ]
    }
   ],
   "source": [
    "arr=np.loadtxt('c:\\\\pythoncoding\\\\data\\\\np-loadtxt.txt', dtype={'names': ('gender', 'age', 'weight'),\n",
    "                     'formats': ('S1', 'i4', 'f4')})\n",
    "print arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#二进制文件的读写\n",
    "data=np.empty((10,10))\n",
    "np.save('c:\\\\pythoncoding\\\\data\\\\test.npy',data)\n",
    "np.savez('c:\\\\pythoncoding\\\\data\\\\test.npz',data)\n",
    "#savez相对于save，在于对于大文件而言，savez在存储之前先压缩，但存储速度比较慢\n",
    "newdata=np.load('c:\\\\pythoncoding\\\\data\\\\test.npy')"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
